Re: Airflow Dynamic tasks

2019-09-12 Thread Ry Walker
Hi Eugene - We run an Airflow deployment that executes ~6,000 hourly tasks across 32 dynamic DAGs - happy to jump on a call some time and talk about our setup, which now includes a step to generate one file per dynamic DAG, so that each DAG gets its own scheduler loop. Hit me up r...@astronomer
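For anyone curious what that file-generation step can look like: a minimal sketch, assuming a string template and a hard-coded table list (none of this is Ry's actual setup; paths and names are made up):

    # generate_dags.py - hypothetical sketch: emit one standalone DAG file per table so
    # the scheduler parses each dynamic DAG as its own small module instead of one giant file.
    from pathlib import Path

    TEMPLATE = '''from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG("load_{table}", start_date=datetime(2019, 1, 1), schedule_interval="@hourly")
    load = BashOperator(task_id="load_{table}", bash_command="echo loading {table}", dag=dag)
    '''

    tables = ["orders", "customers"]  # assumption: normally read from a config file or DB snapshot
    for table in tables:
        out = Path("/usr/local/airflow/dags") / ("load_%s.py" % table)
        out.write_text(TEMPLATE.format(table=table))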

Re: Airflow Dynamic tasks

2019-09-12 Thread Bacal, Eugene
Wanted to ask for advice... Because I run 1300 tasks every 30 min in parallel, this scenario creates a lot of DB connections and freezes the UI. I would like to segregate the result_backend from the Airflow DB. Current setup: broker_url = rabbitmq, result_backend = mysqldb, sql_alchemy_conn = mysqldb N
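Splitting the two is mostly an airflow.cfg change: point the Celery result backend at a different database (or Redis) than the metadata DB. A hedged sketch, with made-up hostnames and credentials:

    [core]
    # Airflow metadata DB used by the scheduler, webserver, and task state
    sql_alchemy_conn = mysql://airflow:***@metadata-db:3306/airflow

    [celery]
    broker_url = amqp://airflow:***@rabbitmq:5672//
    # Celery task results live in a separate MySQL instance so result writes
    # stop competing with the metadata DB for connections
    result_backend = db+mysql://airflow:***@results-db:3306/celery_results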

Re: Airflow Dynamic tasks

2019-08-29 Thread Bacal, Eugene
Thank you, Daniel! Sorry for spamming. https://stackoverflow.com/questions/57718658/airflow-celery-workers-too-many-mysql-connections On 8/29/19, 1:14 PM, "Daniel Standish" wrote: Eugene Why don't you create a stack overflow post and give us the link? That is probably

Re: Airflow Dynamic tasks

2019-08-29 Thread Daniel Standish
Eugene Why don't you create a stack overflow post and give us the link? That is probably a better way to help you through this. We will need to see what exactly you are doing in your dag files and potentially also hooks / operators. Thanks On Thu, Aug 29, 2019 at 10:41 AM Bacal, Eugene wrote

Re: Airflow Dynamic tasks

2019-08-29 Thread Bacal, Eugene
Can someone advise if this is expected behavior, please? - DB connections are not being re-used - Connections stay open while only 5 are active: mysql> show global status like 'Thread%'; | Variable_name
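For reference, the four counters that query returns on a stock MySQL server (the values are cut off in the paste above), annotated:

    mysql> SHOW GLOBAL STATUS LIKE 'Thread%';
    -- Threads_cached    : idle threads held in the server-side thread cache
    -- Threads_connected : currently open client connections
    -- Threads_created   : total threads created since startup (climbs fast when
    --                     connections are not being re-used)
    -- Threads_running   : connections actively executing a statement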

Re: Airflow Dynamic tasks

2019-08-21 Thread Bacal, Eugene
Hi Max, We ran a few tests today from the DB side and noticed that: - DB connections are not being re-used - Connections stay open while only 5 are active: mysql> show global status like 'Thread%'; | Variable_name

Re: Airflow Dynamic tasks

2019-08-20 Thread Bacal, Eugene
Celery executor. 12 bare-metal boxes, each with 40 CPUs @ 2494 MHz and 378 GB RAM. Worker .cfg: [core] sql_alchemy_pool_size = 5, sql_alchemy_pool_recycle = 900, sql_alchemy_reconnect_timeout = 300, parallelism = 1200, dag_concurrency = 800, non_pooled_task_slot_count = 1200, max_active_runs_per_dag = 10, dagb
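Some rough arithmetic on why those settings can still exhaust MySQL, assuming (as Daniel explains below in this thread) that every task-instance process opens at least one connection of its own rather than sharing the 5-connection pool:

    sql_alchemy_pool_size = 5      # caps connections per *process*, not per cluster
    parallelism           = 1200   # up to 1200 task processes cluster-wide
    worst case           ~= 1200 x 1+ connections, plus scheduler and webserver pools,
                            versus MySQL's default max_connections of 151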

Re: Airflow Dynamic tasks

2019-08-20 Thread Maxime Beauchemin
Delay between tasks could be due to not having enough worker slots. What type of executor are you using, and how is it configured? Max On Tue, Aug 20, 2019 at 7:50 AM Bacal, Eugene wrote: > Absolutely possible, Daniel, > > We are looking in all directions. Has anyone noticed performance > improvements
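To make the worker-slot point concrete with the numbers from this thread (12 worker boxes, ~1000 tasks per 30-minute window), and assuming Celery's default worker_concurrency of 16 since Eugene's actual value isn't shown:

    12 workers x 16 slots = 192 concurrent task slots
    ~1000 ready tasks / 192 slots -> tasks wait in the queue for a free slot,
    which shows up as a delay between tasks even when scheduling and parsing are fast.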

Re: Airflow Dynamic tasks

2019-08-20 Thread Bacal, Eugene
Absolutely possible, Daniel, We are looking in all directions. Has anyone noticed performance improvements with PostgreSQL vs MySQL ? -Eugene On 8/15/19, 2:03 PM, "Daniel Standish" wrote: It's not just webserver and scheduler that will parse your dag file. During the execution of

Re: Airflow Dynamic tasks

2019-08-15 Thread Daniel Standish
It's not just the webserver and scheduler that will parse your dag file. During the execution of a dag run, the dag file will be re-parsed at the start of every task instance. If you have 1000 tasks running in a short period of time, that's 1000 queries. It's possible these queries are piling up in a queue
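A common way around that: do the table lookup once, cache it to a local file, and let every subsequent parse read the cache instead of MySQL. A hedged sketch only (the connection id, cache path, and TTL are made up; this is not Eugene's code):

    # dags/dynamic_loader.py (sketch) - avoid one MySQL query per DAG-file parse
    import json, os, time
    from airflow.hooks.mysql_hook import MySqlHook

    CACHE_PATH = "/tmp/dynamic_tables.json"
    CACHE_TTL = 300  # seconds; a slightly stale table list beats 1000 queries per run

    def get_tables():
        if os.path.exists(CACHE_PATH) and time.time() - os.path.getmtime(CACHE_PATH) < CACHE_TTL:
            with open(CACHE_PATH) as f:
                return json.load(f)
        rows = MySqlHook(mysql_conn_id="source_db").get_records("SHOW TABLES")
        tables = [row[0] for row in rows]
        with open(CACHE_PATH, "w") as f:
            json.dump(tables, f)
        return tables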

Re: Airflow Dynamic tasks

2019-08-15 Thread Bacal, Eugene
Thank you for your reply, Max. The dynamic DAGs query the database for tables and generate DAGs and tasks based on the output. As plain Python it does not take much time to execute. Dynamic - 500 tasks: time python PPAD_OIS_MASTER_IDI.py [2019-08-15 12:57:48,522] {settings.py:174} INFO - setting.configure_orm(
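In schematic form, the pattern being described looks roughly like this (a sketch of the general shape, not the contents of PPAD_OIS_MASTER_IDI.py; the connection id is made up):

    # Module-scope query: it runs on every parse - each scheduler loop, each webserver
    # refresh, and the start of every single task instance.
    from datetime import datetime
    from airflow import DAG
    from airflow.hooks.mysql_hook import MySqlHook
    from airflow.operators.bash_operator import BashOperator

    tables = MySqlHook(mysql_conn_id="source_db").get_records("SHOW TABLES")

    dag = DAG("dynamic_master", start_date=datetime(2019, 1, 1), schedule_interval="*/30 * * * *")
    for (table,) in tables:
        BashOperator(task_id="load_%s" % table, bash_command="echo load %s" % table, dag=dag)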

Re: Airflow Dynamic tasks

2019-08-15 Thread Maxime Beauchemin
What is your dynamic DAG doing? How long does it take to execute it just as a python script (`time python mydag.py`)? As an Airflow admin, you may want to lower the DAG parsing timeout configuration key to force people not to do crazy things in DAG module scope. At some point at Airbnb we had so
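For reference, the parsing-timeout knob alluded to here is presumably dagbag_import_timeout in the [core] section (it defaults to 30 seconds in 1.10); a sketch:

    [core]
    # abort importing any DAG file that takes longer than this many seconds to parse,
    # which surfaces expensive module-scope work instead of silently slowing the scheduler
    dagbag_import_timeout = 30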

Airflow Dynamic tasks

2019-08-14 Thread Bacal, Eugene
Hello Airflow team, Please advise if you can. In our environment, we have noticed that dynamic tasks place quite a bit of stress on the scheduler and webserver and increase MySQL DB connections. We run about 1000 dynamic tasks every 30 min, and parsing time increases from 5 to 65 sec, with runtime from 2s
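A quick way to see that per-file parse-time growth, assuming the 1.10 CLI:

    $ airflow list_dags -r    # prints the DagBag loading report: per-file parse time,
                              # DAG count, and task count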