Hi Eugene -
We run an Airflow deployment that executes ~6,000 hourly tasks across 32
dynamic DAGs - happy to jump on a call some time and talk about our setup,
which now includes a step to generate one file per dynamic DAG, so that each
DAG gets its own scheduler loop. Hit me up r...@astronomer
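For anyone curious, the one-file-per-DAG step is roughly the pattern below. This is a sketch, not our actual code: the `my_company.dag_factory.build_dag` helper and all paths are illustrative placeholders.

```python
import tempfile
from pathlib import Path

# Template for each generated file; the import target is hypothetical.
TEMPLATE = '''from my_company.dag_factory import build_dag  # hypothetical helper

dag = build_dag({dag_id!r})
'''

def write_dag_files(dag_ids, dags_folder):
    """Write one small .py file per dynamic DAG so each parses independently."""
    out = Path(dags_folder)
    out.mkdir(parents=True, exist_ok=True)
    for dag_id in dag_ids:
        (out / f"{dag_id}.py").write_text(TEMPLATE.format(dag_id=dag_id))
    return sorted(p.name for p in out.glob("*.py"))

# Demo against a temp dir standing in for the dags folder
demo_dir = tempfile.mkdtemp()
print(write_dag_files(["example_a", "example_b"], demo_dir))
# → ['example_a.py', 'example_b.py']
```

The point of the split is that the scheduler parses each small file on its own, instead of one monolithic module that regenerates every DAG on every parse.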
Wanted to ask for some advice...
Because I run 1,300 tasks every 30 min in parallel, this scenario creates a lot
of DB connections, which freezes the UI.
I would like to segregate result_backend and Airflow DB.
Current setup:
broker_url = rabbitmq
result_backend = mysqldb
sql_alchemy_conn = mysqldb
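One way to split them is to point result_backend at a separate database in airflow.cfg. A sketch, with placeholder hosts and credentials (note that Celery's SQLAlchemy result backends take a `db+` prefix on the URL):

```ini
[core]
sql_alchemy_conn = mysql://airflow:***@airflow-db:3306/airflow

[celery]
broker_url = amqp://guest:guest@rabbitmq:5672//
result_backend = db+mysql://airflow:***@results-db:3306/celery_results
```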
Thank you, Daniel! Sorry for spamming.
https://stackoverflow.com/questions/57718658/airflow-celery-workers-too-many-mysql-connections
On 8/29/19, 1:14 PM, "Daniel Standish" wrote:
Eugene
Why don't you create a stack overflow post and give us the link?
That is probably a better way to help you through this.
We will need to see what exactly you are doing in your dag files and
potentially also hooks / operators.
Thanks
On Thu, Aug 29, 2019 at 10:41 AM Bacal, Eugene
wrote
Can someone advise if this is expected behavior, please?
- DB connections are not being re-used
- Connections stay open while only 5 are active:
mysql> show global status like 'Thread%';
[output truncated]
Hi Max,
We ran a few tests today from the DB side and noticed that:
- DB connections are not being re-used
- Connections stay open while only 5 are active:
mysql> show global status like 'Thread%';
[output truncated]
Celery executor.
12 bare-metal boxes, each: 40 CPU(s) @ 2494.015 MHz, 378G RAM
Worker .cfg:
[core]
sql_alchemy_pool_size = 5
sql_alchemy_pool_recycle = 900
sql_alchemy_reconnect_timeout = 300
parallelism = 1200
dag_concurrency = 800
non_pooled_task_slot_count = 1200
max_active_runs_per_dag = 10
dagb
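For context on where the connection count can come from: each forked Celery worker process opens its own SQLAlchemy pool, so totals multiply. A back-of-envelope sketch, where processes-per-box is my assumption and the other numbers come from this thread:

```python
# Rough connection ceiling: pools are per-process, not per-box.
boxes = 12           # bare-metal workers (from this thread)
procs_per_box = 40   # assumption: one worker process per core
pool_size = 5        # sql_alchemy_pool_size (from this thread)

max_db_connections = boxes * procs_per_box * pool_size
print(max_db_connections)  # → 2400
```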
Delay between tasks could be due to not having enough worker slots. What
type of executor are you using, how is it configured?
Max
On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene
wrote:
Absolutely possible, Daniel,
We are looking in all directions. Has anyone noticed performance improvements
with PostgreSQL vs MySQL ?
-Eugene
On 8/15/19, 2:03 PM, "Daniel Standish" wrote:
It's not just webserver and scheduler that will parse your dag file.
During the execution of a dag run, the dag file will be re-parsed at the start
of every task instance. If you have 1000 tasks running in a short period of
time, that's 1000 queries. It's possible these queries are piling up in a
queue
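One mitigation for that pattern (a sketch, not an Airflow API; `fetch_tables_from_db` and the paths are hypothetical) is to cache the query result on disk with a TTL, so the many per-task re-parses read a local file instead of all hitting the DB:

```python
import json
import os
import tempfile
import time

# Illustrative cache location and TTL; tune for your environment.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "dynamic_dag_tables.json")
CACHE_TTL = 300  # seconds between real DB hits

def fetch_tables_from_db():
    # Hypothetical stand-in for the real metadata query
    return ["table_a", "table_b"]

def get_tables():
    """Return the table list, hitting the DB at most once per CACHE_TTL."""
    try:
        if time.time() - os.path.getmtime(CACHE_PATH) < CACHE_TTL:
            with open(CACHE_PATH) as f:
                return json.load(f)
    except OSError:
        pass  # cache missing or unreadable; fall through to the DB
    tables = fetch_tables_from_db()
    with open(CACHE_PATH, "w") as f:
        json.dump(tables, f)
    return tables

print(get_tables())  # parses within the TTL read the file, not the DB
```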
Thank you for your reply, Max
Dynamic DAGs query the database for tables and generate DAGs and tasks based
on the output.
As a Python script, it does not take long to execute:
Dynamic - 500 tasks:
time python PPAD_OIS_MASTER_IDI.py
[2019-08-15 12:57:48,522] {settings.py:174} INFO - setting.configure_orm(
What is your dynamic DAG doing? How long does it take to execute it just as
a python script (`time python mydag.py`)?
As an Airflow admin, you may want to lower the DAG parsing timeout
configuration key to force people not to do crazy things in DAG module
scope. At some point at Airbnb we had so
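For reference, the knob being described is `dagbag_import_timeout` under `[core]`; a file whose import exceeds it fails instead of stalling the scheduler. The value below is just an example:

```ini
[core]
# Kill the import of any DAG file that takes longer than this many seconds
dagbag_import_timeout = 30
```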
Hello Airflow team,
Please advise if you can. In our environment, we have noticed that dynamic
tasks place quite a bit of stress on the scheduler and webserver, and increase
MySQL DB connections.
We run about 1,000 dynamic tasks every 30 min, and parsing time increases
from 5 to 65 sec, with runtime from 2s