Thanks Kevin!
Your account of the work you have done at Airbnb will be a great
reference! It shows how scalable Airflow can be in a real-world
use case. Very helpful.
Best regards,
XD
On Sat, Sep 8, 2018 at 6:52 AM Ruiqin Yang wrote:
Thank you Xiaodong for bringing this up, and pardon me for being late to
this thread. I'm sharing the setup within Airbnb and some ideas and progress,
which should benefit people who are interested in this topic.
*- Setting-up*:
One-time on 1.8 with cherry-picks, planning to move to containerization
after
Yeah, we are seeing the scheduler become a bottleneck as the number of DAG
files increases, since the scheduler can scale vertically but not horizontally.
We are trying multiple independent Airflow setups and distributing the
load between them.
But managing this many Airflow clusters is becoming a
Thanks for sharing, Raman.
Based on what you shared, I think there are two points that may be worth
discussing further.
*Scaling up (given thousands of DAGs):*
If you have thousands of DAGs, you may encounter longer scheduling latency
(actual start time minus planned start time).
For
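
The latency definition above (actual start time minus planned start time) can be sketched in a few lines of Python. This is only an illustration: the function name scheduling_latency and the sample timestamps are hypothetical and are not taken from Airflow or from this thread.

```python
from datetime import datetime, timedelta


def scheduling_latency(planned_start: datetime, actual_start: datetime) -> timedelta:
    """Return actual start time minus planned start time.

    Hypothetical helper for illustration; not part of the Airflow API.
    """
    return actual_start - planned_start


# Made-up sample values: a run planned for 06:00:00 that started at 06:00:45.
planned = datetime(2018, 9, 8, 6, 0, 0)
actual = datetime(2018, 9, 8, 6, 0, 45)

latency = scheduling_latency(planned, actual)
print(latency.total_seconds())
```

Tracking this value per DAG run is one possible way to see whether latency grows as the number of DAGs increases.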
Hi,
We have a requirement to scale to thousands of concurrent DAGs. With the
CeleryExecutor we observed that
the Airflow worker sometimes gets stuck if the connection to Redis/MySQL breaks
(https://github.com/celery/celery/issues/3932
https://github.com/celery/celery/issues/4457)
Currently we are using
I'll play!
Setting up: containerized (kubernetes/helm on GCP)
Executors: CeleryExecutor
Scale: 1 worker node
Queues: not using, we use pools very lightly to restrict some long running
tasks but mostly get away without it
Scheduler SLA: no
# of DAGs/Tasks: more on the scale of 10-15 DAGs with 3-20
Many thanks for sharing, Manu!
I realise I have missed an important question: how many DAGs/tasks is your
Airflow instance dealing with?
I would like to share the current status in my organisation as well:
*- Setting-up*: we're using both "one-time" and containerized setups,
in different
Hi Xiaodong,
Thanks for preparing the questions.
Setting-Up: In container (previously Swarm and now K8S)
Executor: CeleryExecutor
Scale: two airflow workers
Queue: No
SLA: We don't have a hard limit, but it would be unacceptable for a DAG to
take more than one minute to be scheduled.
Airflow has been
I have no comment on your comments.
Just to make my questions clearer: SLA here means the internal service
agreement on how quickly a DAG should be processed by the scheduler or a
task be scheduled, in order to assess the performance. I'm not talking
about the SLA email notification feature in
Hi,
Setting up Airflow for the first time is a BIG DEAL.
Despite the community's intention of an easy install with SQLite and the
SequentialExecutor, an actual working environment requires changing a lot
of settings. It doesn't help much that the demo install goes smoothly.
The
Hi folks,
Could you kindly share how your organization sets up and uses Airflow,
especially in terms of architecture? For example,
- *Setting-Up*: Do you install Airflow in a "one-time" fashion, or in a
containerized fashion?
- *Executor:* Which executor are you using (*LocalExecutor*,