Re: Benchmarking of Airflow Scheduler with Celery Executor

2018-04-18 Thread yrqls21
On 2018/04/13 17:00:36, Maxime Beauchemin wrote:
> If you're concerned about scheduler scalability I'd go with a bigger box.
> The scheduler uses multiprocessing so more CPU power means more throughput.
>
> Also you may want to provision a beefy MySQL box to make

Re: Benchmarking of Airflow Scheduler with Celery Executor

2018-04-13 Thread Maxime Beauchemin
If you're concerned about scheduler scalability I'd go with a bigger box. The scheduler uses multiprocessing so more CPU power means more throughput. Also you may want to provision a beefy MySQL box to make sure that doesn't become the bottleneck. 10k tasks heartbeating to the DB every 30 seconds
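As a rough back-of-envelope check on the figures quoted in this message (10k tasks, one heartbeat every 30 seconds), the sketch below estimates the average write rate the metadata database would see from heartbeats alone. It assumes one DB write per task per heartbeat, spread evenly over the interval; the function name is hypothetical and not part of Airflow.

```python
def heartbeat_writes_per_second(concurrent_tasks: int, heartbeat_interval_s: float) -> float:
    """Average heartbeat writes per second.

    Assumption (not from Airflow itself): each running task issues exactly
    one DB write per heartbeat, and heartbeats are evenly spread in time.
    """
    if heartbeat_interval_s <= 0:
        raise ValueError("heartbeat_interval_s must be positive")
    return concurrent_tasks / heartbeat_interval_s


# Figures taken from the message: 10k tasks, 30-second heartbeat.
rate = heartbeat_writes_per_second(10_000, 30)
print(f"{rate:.0f} heartbeat writes/s on average")
```

This only counts heartbeats; scheduler queries and task state changes would add to the load, so the real figure is likely higher.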

Re: Benchmarking of Airflow Scheduler with Celery Executor

2018-04-13 Thread ramandumcs
Thanks Ry. Just wondering if there is an approximate number for the concurrent tasks a scheduler can run on, say, a machine with 16 GB of RAM and 8 cores. If that has already been measured, it would be useful to know. We did some benchmarking with the local executor and observed that each TaskInstance was taking ~100 MB of memory so
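To put the ~100 MB per TaskInstance observation in context, here is a minimal sketch of the memory ceiling it implies for a single 16 GB box running the local executor. It assumes memory is the only limit and that the whole 16 GB is available to tasks, which overstates real capacity; the function and parameter names are hypothetical.

```python
def max_tasks_by_memory(total_ram_gb: float, per_task_mb: float, reserved_gb: float = 0.0) -> int:
    """Upper bound on concurrent tasks if memory were the only constraint.

    Assumptions: per-task memory is constant, and `reserved_gb` (for the OS,
    scheduler, and so on) is a guess the caller supplies; 1 GB = 1024 MB.
    """
    usable_mb = (total_ram_gb - reserved_gb) * 1024
    if usable_mb <= 0 or per_task_mb <= 0:
        return 0
    return int(usable_mb // per_task_mb)


# Figures taken from the message: 16 GB machine, ~100 MB per TaskInstance.
ceiling = max_tasks_by_memory(16, 100)
print(ceiling)  # 163
```

Passing a non-zero `reserved_gb` gives a more conservative bound, for example `max_tasks_by_memory(16, 100, reserved_gb=2)`.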

Re: Benchmarking of Airflow Scheduler with Celery Executor

2018-04-12 Thread Ry Walker
Hi Raman - First, we’d be happy to help you test this out with Airflow. Or you could do it yourself by using http://open.astronomer.io/airflow/ (w/ Docker Engine + Docker Compose) to quickly spin up a test environment. Everything is hooked to Prometheus/Grafana to monitor how the system reacts to

Benchmarking of Airflow Scheduler with Celery Executor

2018-04-12 Thread ramandumcs
Hi All, We have a requirement to run 10k(s) of concurrent tasks, and we are exploring Airflow's Celery Executor for this. Horizontal scaling of worker nodes seems possible, but there can be only one active scheduler. Will the Airflow scheduler be able to handle this many concurrent tasks? Is there any