Re: Airflow 1.10 Migration Duration

2018-09-25 Thread Ruiqin Yang
Thank you Taylor, the db-cleanup DAG is very nice! I have a question for you: should we expect the DB migration to be backward compatible, i.e. would a 1.8.x cluster run fine with the upgraded DB? Thank you! Kevin Y On Tue, Sep 25, 2018 at 6:14 PM Taylor Edmiston wrote: > I haven't done 1.8.x to 1.10.x

Re: Best Practice of Airflow Setting-Up & Usage

2018-09-07 Thread Ruiqin Yang
Thank you Xiaodong for bringing this up and pardon me for being late on this thread. Sharing the setup within Airbnb and some ideas and progress, which should benefit people who are interested in this topic. *- Setting-up*: One-time on 1.8 with cherry-picks, planning to move to containerization after

Re: Why not mark inactive DAGs in the main scheduler loop?

2018-08-22 Thread Ruiqin Yang
source the change. Cheers, Kevin Y On Wed, Aug 22, 2018 at 6:45 PM Taylor Edmiston wrote: > Kevin - Is there a Jira issue one can follow for this? > > On Wed, Aug 22, 2018 at 5:29 PM Ruiqin Yang wrote: > > > I'm working on splitting the DAG parsing manager to a subprocess and wit

Re: Why not mark inactive DAGs in the main scheduler loop?

2018-08-22 Thread Ruiqin Yang
I'm working on splitting the DAG parsing manager into a subprocess, and with that we don't need to worry about the scheduler doing non-supervisor work or prolonging the scheduler loop duration. I can make a follow-up PR to address this once I have the original PR published, if you guys don't have plans to work

Re: Failover in apache 1.8.0

2018-07-20 Thread Ruiqin Yang
hin a > pre-configured interval beginning from the scheduled time of start. Isn't > there something similar in airflow? > > Regards > Shubham Gupta > > On Fri, Jul 20, 2018 at 1:42 PM Shubham Gupta > wrote: > > > Hi Ruiqin Yang, > > > > Can you please elaborate o

Re: Failover in apache 1.8.0

2018-07-20 Thread Ruiqin Yang
Hi Shubham, The worker running the actual Airflow task heartbeats regularly, which updates the task instance entry in the DB. The scheduler kills task instances that have gone without a heartbeat for a long time (these are called zombie tasks), and if the task has retries left it will try to reschedule it (given all trigger rules are
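The heartbeat check described in this message can be pictured with a minimal sketch. This is not Airflow's actual code: the function names, the 5-minute threshold, and the "fail" outcome when no retries remain are all assumptions made for illustration, and trigger rules are ignored.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the real value would come from Airflow's configuration.
ZOMBIE_THRESHOLD = timedelta(minutes=5)


def is_zombie(last_heartbeat, now, threshold=ZOMBIE_THRESHOLD):
    """Return True when a running task has not heartbeated within `threshold`."""
    return now - last_heartbeat > threshold


def next_action(last_heartbeat, now, retries_left, threshold=ZOMBIE_THRESHOLD):
    """Decide what a scheduler-like supervisor would do with a running task.

    Assumption: a zombie with no retries left is marked failed.
    """
    if not is_zombie(last_heartbeat, now, threshold):
        return "keep_running"
    return "reschedule" if retries_left > 0 else "fail"
```

For example, a task whose last heartbeat was ten minutes ago and that still has one retry left would be rescheduled under these assumptions.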

[Proposal] Scale Airflow

2018-07-17 Thread Ruiqin Yang
Hi guys, I'd like to propose a few improvements to Airflow that would help it scale: Scheduler: 1. - Problem: the scheduler loop becomes slow when the number of running tasks grows too large, which slows down DAG parsing/the scheduler loop and creates scheduling delay, AIRFLOW-2156

Re: DAG Level permissions (was Re: RBAC Update)

2018-07-16 Thread Ruiqin Yang
Congratulations! Extraordinary work! Thank you very much! This has been a highly desired feature for us for quite a while. Cheers, Kevin Yang On Mon, Jul 16, 2018 at 9:30 PM Tao Feng wrote: > Hi, > > Just want to give an update that Airflow DAG level access just checked in >

Re: Single Airflow Instance Vs Multiple Airflow Instance

2018-06-08 Thread Ruiqin Yang
> > On 2018/06/08 05:13:39, Ruiqin Yang wrote: > > Not sure about 1.9 but parallelism seems to be supported on master > > < > https://github.com/apache/incubator-airflow/blob/272952a9dce932cb2c648f82c9f9f2cafd572ff1/airflow/executors/base_executor.py#L113 > >

Re: Single Airflow Instance Vs Multiple Airflow Instance

2018-06-07 Thread Ruiqin Yang
me details on airflow setup like > Airflow Version, Machine configuration, Airflow cfg settings etc.. > How can we configure infinity(0) for cluster-wide setting. (We are using > airflow v1.9 and it seems that > airflow cfg's parallelism = 0 is not supported in v1.9) > > On 2018/06/07

Re: Single Airflow Instance Vs Multiple Airflow Instance

2018-06-07 Thread Ruiqin Yang
Here to provide a datapoint from Airbnb: all users share the same cluster (~8k active DAGs and ~15k running tasks at peak). For the cluster-wide concurrency setting, we set it to infinity (0) and scale up the number of workers if we need more worker slots. For the scheduler & Airflow UI coupling, I

Re: Log external module into airflow log

2018-06-01 Thread Ruiqin Yang
AFAIK, stdout of the task will be included in the task_instance log, so you might get what you want by just printing it. Cheers, Kevin Y On Fri, Jun 1, 2018 at 12:20 PM Martin Gauthier wrote: > Here is the output i am getting from the log in airflow UI > > *** Reading local log. > [2018-06-01

Re: Disable Processing of DAG file

2018-05-29 Thread Ruiqin Yang
Hi folks, This config line controls how often the scheduler scans the DAG folder and tries to discover/forget DAGs. For the DAG file processing part, the scheduler does parse the DAG file

Re: Convert Dag Run from Backfill to Scheduled?

2018-05-25 Thread Ruiqin Yang
will work? Thanks! Is there any reason for me not to just > run a mass UPDATE on those dag runs directly in the metadata database? > > > > On May 25, 2018, 4:01 PM -0700, Ruiqin Yang <yrql...@gmail.com>, wrote: > > > Airflow is not going to schedule backfill DAG run

Re: Convert Dag Run from Backfill to Scheduled?

2018-05-25 Thread Ruiqin Yang
Airflow will not schedule backfill DAG runs; it identifies them by the DAG run ID (which starts with 'backfill__'). If you want the scheduler to schedule those tasks, you can click the DAG run and edit its name back to 'scheduled__' Cheers, Kevin Y On Fri, May 25, 2018 at 3:53 PM, Scott Halgrim
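The prefix check and rename described in this message can be sketched as follows. The helper names are hypothetical, and the prefix strings are copied from the message as written; they are an assumption and should be verified against the Airflow version in use before touching any real DAG runs.

```python
# Prefixes as quoted in the message above; assumed, not verified against Airflow.
BACKFILL_PREFIX = "backfill__"
SCHEDULED_PREFIX = "scheduled__"


def is_backfill_run(run_id, backfill_prefix=BACKFILL_PREFIX):
    """Return True when the run ID carries the backfill prefix."""
    return run_id.startswith(backfill_prefix)


def to_scheduled_run_id(run_id,
                        backfill_prefix=BACKFILL_PREFIX,
                        scheduled_prefix=SCHEDULED_PREFIX):
    """Swap the backfill prefix for the scheduled one; leave other IDs untouched."""
    if not is_backfill_run(run_id, backfill_prefix):
        return run_id
    return scheduled_prefix + run_id[len(backfill_prefix):]
```

Under these assumptions, `to_scheduled_run_id("backfill__2018-05-25T00:00:00")` yields the same timestamp with the 'scheduled__' prefix, which is the edit the message suggests making by hand in the UI.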

Re: How Airflow import modules as it executes the tasks

2018-05-15 Thread Ruiqin Yang
operators. That said, my > understanding is if a module has already been imported, it's not loaded > again even if you try to import it again (and I reckon this is why in > Python Singleton is not commonly used). Is that right? > > On 2018/05/16 02:34:18, Ruiqin Yang <yrql...@gmail.

Re: How Airflow import modules as it executes the tasks

2018-05-15 Thread Ruiqin Yang
Not exactly answering your question, but the reason db.py is loaded in each task might be that you have something like `import db` in each of your *.py files, and Airflow spins up one process to parse each *.py file, so your db.py was loaded multiple times. I'm not sure how you can share the
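The behaviour discussed in this thread follows from how Python caches imports: within one process a module body runs once, but every new process loads it again. Below is a small sketch that demonstrates this with a throwaway module; `demo_db` and the counter file are hypothetical names used only for the demonstration, and nothing here is Airflow-specific.

```python
import importlib
import os
import subprocess
import sys
import tempfile


def count_module_loads():
    """Return (loads after two in-process imports, loads after one extra process)."""
    with tempfile.TemporaryDirectory() as tmp:
        counter = os.path.join(tmp, "loads.txt")
        # The module body appends one character each time it is executed.
        with open(os.path.join(tmp, "demo_db.py"), "w") as f:
            f.write("with open(%r, 'a') as fh:\n    fh.write('x')\n" % counter)

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_db")
            importlib.import_module("demo_db")  # served from sys.modules, body not re-run
            with open(counter) as fh:
                in_process = len(fh.read())

            # A fresh interpreter has an empty module cache, so the body runs again.
            env = dict(os.environ, PYTHONPATH=tmp)
            subprocess.run([sys.executable, "-c", "import demo_db"], env=env, check=True)
            with open(counter) as fh:
                total = len(fh.read())
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_db", None)
        return in_process, total
```

Two imports in the same process execute the module once, while the separate interpreter executes it a second time, which matches the explanation that one parsing process per file leads to db.py being loaded repeatedly.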

Re: Apache Airflow welcome new committer/PMC member : Naik Kaxil (a.k.a. kaxil)

2018-05-09 Thread Ruiqin Yang
Wow, congrats! Kevin Y On Wed, May 9, 2018 at 10:58 AM Taylor Edmiston wrote: > Congrats and welcome! > > *Taylor Edmiston* > Blog | Stack Overflow CV > | LinkedIn > | AngelList >

Re: How to consolidate log files?

2018-04-30 Thread Ruiqin Yang
AFAIK, airflow doesn't provide logs in this way. Multiple tasks run in different processes and potentially in parallel, so writing to the same file at run time would produce a log file with mixed log lines from different tasks. Also I believe airflow currently does not separate stdout and stderr,

Re: About how to pause the running task

2018-04-26 Thread Ruiqin Yang
I don't think we have a way to pause, but we can stop a running task by changing its state. Currently you can change the state of a running task in the UI by clicking on the task instance and marking it success/clearing it, change its state to failed/success in the CRUD, or clear it through the cli. I'm

Re: 1.10.0beta1 now available for download

2018-04-23 Thread Ruiqin Yang
Thank you Fokko and Bolke, this is very important progress and delivers big value to us almost immediately. Much appreciated! Cheers, Kevin Y On Mon, Apr 23, 2018 at 4:23 PM, Sumit Maheshwari wrote: > Great work Fokko and Bolke, really really appreciated!! > > On Mon, Apr 23,

Re: Bit confused about start_date and schedule_interval related to daily/weekly DAG

2018-04-18 Thread Ruiqin Yang
Hi Kyle, The execution_date of the DAG run will always lag by one day for your daily DAG and by one week for your weekly DAG. Under the hood, airflow calculates the execution_date and the next execution_date of the task, and only schedules the task when the current timestamp is later than the
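The lag described in this message can be illustrated with a short sketch. It assumes a fixed `timedelta` interval and that a run becomes eligible once the period starting at its execution_date has ended; the function names are hypothetical and this is not Airflow's scheduler code.

```python
from datetime import datetime, timedelta


def earliest_trigger_time(execution_date, schedule_interval):
    """The run for `execution_date` covers one interval and can start when it ends."""
    return execution_date + schedule_interval


def is_due(execution_date, schedule_interval, now):
    """Return True once the current time has reached the end of the run's period."""
    return now >= earliest_trigger_time(execution_date, schedule_interval)
```

With a daily interval, a run whose execution_date is 2018-04-17 00:00 is not due until 2018-04-18 00:00; with a weekly interval the same run would wait until 2018-04-24 00:00, which is the one-day and one-week lag the message describes.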