Re: Best Practice of Airflow Setting-Up & Usage

2018-09-07 Thread Deng Xiaodong
Thanks Kevin! Your write-up of the work you have done at Airbnb is a great reference! It shows how scalable Airflow can be in a real-world use case. Very helpful. Best regards, XD On Sat, Sep 8, 2018 at 6:52 AM Ruiqin Yang wrote: > Thank you Xiaodong for bringing

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-07 Thread Ruiqin Yang
Thank you Xiaodong for bringing this up, and pardon me for being late to this thread. Sharing the setup within Airbnb and some ideas/progress, which should benefit people who are interested in this topic. *- Setting-up*: One-time on 1.8 with cherry-picks, planning to move to containerization after

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-07 Thread ramandumcs
Yeah, we are seeing the scheduler become a bottleneck as the number of DAG files increases, since the scheduler can scale vertically but not horizontally. We are trying multiple independent Airflow setups and distributing the load between them. But managing this many Airflow clusters is becoming a

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-06 Thread Deng Xiaodong
Thanks for sharing, Raman. Based on what you shared, I think there are two points worth discussing further. *Scaling up (given thousands of DAGs):* If you have thousands of DAGs, you may encounter longer scheduling latency (actual start time minus planned start time). For
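The latency definition quoted above (actual start time minus planned start time) can be sketched as follows. This is only an illustration, not code from the thread: the function names `scheduling_latency` and `exceeds_threshold`, the timestamps, and the 60-second threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def scheduling_latency(planned_start: datetime, actual_start: datetime) -> timedelta:
    """Return actual start time minus planned start time."""
    return actual_start - planned_start


def exceeds_threshold(latency: timedelta, threshold: timedelta) -> bool:
    """True when the observed latency is strictly above the threshold."""
    return latency > threshold


# Hypothetical values for illustration only.
planned = datetime(2018, 9, 6, 12, 0, 0)
actual = datetime(2018, 9, 6, 12, 0, 45)

latency = scheduling_latency(planned, actual)
too_slow = exceeds_threshold(latency, timedelta(seconds=60))

print(latency.total_seconds(), too_slow)
```

In practice the planned and actual start times would have to come from wherever a given deployment records them; that source is left open here.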

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-06 Thread ramandumcs
Hi, We have a requirement to scale to thousands of concurrent DAGs. With the CeleryExecutor we observed that an Airflow worker sometimes gets stuck if the connection to Redis/MySQL breaks (https://github.com/celery/celery/issues/3932 https://github.com/celery/celery/issues/4457). Currently we are using

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-05 Thread Laura Lorenz
I'll play! Setting up: containerized (kubernetes/helm on GCP) Executors: CeleryExecutor Scale: 1 worker node Queues: not using, we use pools very lightly to restrict some long running tasks but mostly get away without it Scheduler SLA: no # of DAGs/Tasks: more on the scale of 10-15 DAGs with 3-20

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-05 Thread Deng Xiaodong
Many thanks for sharing, Manu! I realise I have missed an important question: how many DAGs/tasks is your Airflow instance dealing with? I would like to share the current status in my organisation as well: *- Setting-up*: we're using both the "one-time" and the containerized set-up approaches, in different

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-05 Thread Manu Zhang
Hi Xiaodong, Thanks for preparing the questions. Setting-Up: In container (previously Swarm and now K8S) Executor: CeleryExecutor Scale: two airflow workers Queue: No SLA: We don't have a hard limit, but it would be unbearable for a DAG to take more than one minute to be scheduled. Airflow has been

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-05 Thread Deng Xiaodong
I have no comment on your comments. Just to make my questions clearer: SLA here means the internal service agreement on how quickly a DAG should be processed by the scheduler, or a task should be scheduled, in order to assess the performance. I'm not talking about the SLA email notification feature in

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-05 Thread airflowuser
Hi, Setting up Airflow for the first time is a BIG DEAL. Unlike the community's initial intention of an easy install with SQLite and SequentialExecutor, for an actual working environment you need to change a lot of settings. It doesn't help much that the demo install went smoothly. The

Best Practice of Airflow Setting-Up & Usage

2018-09-05 Thread Deng Xiaodong
Hi folks, Could you kindly share how your organization sets up and uses Airflow? Especially in terms of architecture. For example, - *Setting-Up*: Do you install Airflow in a "one-time" fashion, or in a containerized fashion? - *Executor:* Which executor are you using (*LocalExecutor*,