Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-11-09 Thread Maxime Beauchemin
> […] with all the drawbacks mentioned already. […]

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-11-09 Thread Daniel (Daniel Lamblin) [BDP - Seoul]

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-11-09 Thread Maxime Beauchemin
I mean at that point it's just as easy (or easier) to do things properly: get the scheduler subprocesses to take a lock on the DAG they're about to process, and release it when done. Add a lock timestamp and a bit of logic to expire locks (to self-heal if the process ever crashed and failed at releasing it)…
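A minimal sketch of the locking scheme described above, assuming a SQLAlchemy-backed metadata DB; the dag_lock table, its columns, and the timeout are illustrative stand-ins, not Airflow's actual schema:

    import datetime
    import sqlalchemy as sa

    LOCK_TIMEOUT = datetime.timedelta(minutes=5)  # assumed expiry window

    metadata = sa.MetaData()
    # Hypothetical table: one row per DAG file, pre-populated elsewhere.
    dag_lock = sa.Table(
        "dag_lock", metadata,
        sa.Column("dag_file", sa.String(500), primary_key=True),
        sa.Column("locked_by", sa.String(100), nullable=True),
        sa.Column("locked_at", sa.DateTime, nullable=True),
    )

    def try_acquire(conn, dag_file, worker_id):
        """Return True if this process now holds the lock on dag_file."""
        now = datetime.datetime.utcnow()
        expired_before = now - LOCK_TIMEOUT
        # Take the lock only if it is free or expired (self-healing when
        # a crashed process never released it).
        result = conn.execute(
            dag_lock.update()
            .where(dag_lock.c.dag_file == dag_file)
            .where(sa.or_(dag_lock.c.locked_at.is_(None),
                          dag_lock.c.locked_at < expired_before))
            .values(locked_by=worker_id, locked_at=now)
        )
        return result.rowcount == 1

    def release(conn, dag_file, worker_id):
        """Release only a lock we still own."""
        conn.execute(
            dag_lock.update()
            .where(dag_lock.c.dag_file == dag_file)
            .where(dag_lock.c.locked_by == worker_id)
            .values(locked_by=None, locked_at=None)
        )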

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-11-08 Thread Daniel (Daniel Lamblin) [BDP - Seoul]
Since you're discussing multi-scheduler trials: based on v1.8 we have also tried something, passing a regex to each scheduler; DAG file paths which match it are ignored. This required turning off some logic that deletes DAG data for DAGs that are missing from the DagBag. It is pretty…
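A rough sketch of that regex filter, assuming a v1.8-style file scan; the function name and the ignore_pattern parameter are illustrative, and the real change would hook into Airflow's own DAG-file discovery:

    import os
    import re

    def list_dag_files(dag_folder, ignore_pattern=None):
        """Yield .py files under dag_folder, skipping paths matching the regex."""
        ignore = re.compile(ignore_pattern) if ignore_pattern else None
        for root, _dirs, files in os.walk(dag_folder):
            for name in files:
                if not name.endswith(".py"):
                    continue
                path = os.path.join(root, name)
                if ignore and ignore.search(path):
                    continue  # leave this file to another scheduler instance
                yield path

    # e.g. one scheduler ignores team_b's DAGs, another ignores team_a's:
    #   list_dag_files("/dags", ignore_pattern=r"/team_b/")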

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-11-05 Thread Deng Xiaodong
Thanks Devjyoti for your reply. To elaborate based on your inputs:
- *When to add one more shard*: we have designed some metrics, like "how long the scheduler instance takes to parse & schedule all DAGs (in the subdir it's taking care of)". When the metric is higher than a given threshold for…
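A small sketch of that signal under assumed numbers: time each full parse-and-schedule pass over the shard's subdir, and flag the shard once the duration stays above the threshold. The threshold, the breach count, and the function names are illustrative:

    import time

    LOOP_THRESHOLD_SECS = 60.0   # assumed budget for one full pass
    BREACHES_BEFORE_ALERT = 5    # demand sustained breaches, not one spike

    def monitor_scheduler_loop(run_one_pass):
        breaches = 0
        while True:
            start = time.monotonic()
            run_one_pass()  # parse & schedule all DAGs in this shard's subdir
            elapsed = time.monotonic() - start
            breaches = breaches + 1 if elapsed > LOOP_THRESHOLD_SECS else 0
            if breaches >= BREACHES_BEFORE_ALERT:
                print("shard overloaded: consider adding one more scheduler")
                breaches = 0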

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-11-01 Thread Devjyoti Patra
>> 1. “Shard by # of files may not yield same load”: fully agree with you. This concern was also raised by other co-workers in my team. But given this is a preliminary trial, we didn't consider this yet.
One issue here is: when do you decide to add one more shard? I think if you monitor the time…

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-11-01 Thread Deng Xiaodong
Thanks Kevin and Max for your inputs! To Kevin's questions:
1. “Shard by # of files may not yield same load”: fully agree with you. This concern was also raised by other co-workers in my team. But given this is a preliminary trial, we didn't consider it yet.
2. We haven't started to look into…

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-10-31 Thread Maxime Beauchemin
A few related thoughts:
* there may be hiccups around concurrency (pools, queues), though the worker should double-check that the constraints are still met when firing the task, so in theory this should be ok
* there may be more "misfires", meaning the task gets sent to the worker, but by the time…
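A hedged sketch of the worker-side "double-check" described above: before actually firing a task, re-verify that the pool constraint still holds, and treat a failure as a misfire (send the task back instead of running it). All names and the in-memory pool state are illustrative stand-ins, not Airflow's real internals:

    POOL_SIZE = {"default_pool": 4}   # assumed pool configuration
    running = {"default_pool": 0}     # slots currently in use

    def constraints_still_met(pool):
        """Re-check at fire time: state may have changed since scheduling."""
        return running.get(pool, 0) < POOL_SIZE.get(pool, 0)

    def fire_task(task, pool="default_pool"):
        if not constraints_still_met(pool):
            return "misfire"          # requeue; another scheduler raced us
        running[pool] += 1
        try:
            task()                    # run the actual work
        finally:
            running[pool] -= 1
        return "done"

    print(fire_task(lambda: print("task ran")))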

Re: A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-10-31 Thread Kevin Yang
Finally we start to talk about this seriously? Yeah! :D For your approach, a few thoughts:
1. Sharding by # of files may not yield the same load, and can yield very different load, since we may have some framework DAG file producing 500 DAGs that takes forever to parse.
2. I think Alex Guziel…
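Point 1 is that splitting DAG files evenly by count can still split load very unevenly. One load-aware alternative (not proposed in the thread, just a sketch): greedily assign files to the shard with the smallest accumulated parse time. The parse times below are assumed measurements:

    import heapq

    def shard_by_load(parse_times, n_shards):
        """parse_times: {dag_file: seconds}. Returns one file list per shard."""
        shards = [(0.0, i, []) for i in range(n_shards)]
        heapq.heapify(shards)
        # Place the heaviest files first, always onto the lightest shard.
        for path, secs in sorted(parse_times.items(), key=lambda kv: -kv[1]):
            load, i, files = heapq.heappop(shards)
            files.append(path)
            heapq.heappush(shards, (load + secs, i, files))
        return [files for _load, _i, files in sorted(shards, key=lambda s: s[1])]

    # One heavy "framework" file dominates; even file counts would not balance this:
    print(shard_by_load({"framework.py": 120, "a.py": 2, "b.py": 3, "c.py": 1}, 2))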

A Naive Multi-Scheduler Architecture Experiment of Airflow

2018-10-31 Thread Deng Xiaodong
Hi Folks, Previously I initiated a discussion about best practices for setting up Airflow, and a few folks agreed that the scheduler may become one of the bottleneck components (we can only run one scheduler instance, and it can only scale vertically rather than horizontally, etc.). Especially…
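The naive experiment the thread goes on to discuss amounts to running several scheduler processes side by side, each restricted to its own subdirectory of the DAG folder via the scheduler's --subdir option (available in Airflow 1.x). A sketch, where the shard directory layout is an assumption for the demo:

    import subprocess

    SHARDS = [
        "/opt/airflow/dags/shard_0",
        "/opt/airflow/dags/shard_1",
        "/opt/airflow/dags/shard_2",
    ]

    # One scheduler per shard; each only parses and schedules its subdir.
    procs = [
        subprocess.Popen(["airflow", "scheduler", "--subdir", shard])
        for shard in SHARDS
    ]
    for p in procs:
        p.wait()  # in practice these run under a supervisor, not wait()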